Search engine and digital rights management

ABSTRACT

DRM-encrypted content is opened up to “trusted search”, without compromising copyright control, thus allowing end users to locate DRM-encrypted content alongside open unencrypted content. The indexer (or crawler) (216) of a search engine (214) is provided with a DRM module (302) for communication with a DRM server (306) so that the indexer (216) can access even the encrypted content nominally as if it were a human end user of the content. The indexer (216) may be issued with a DRM-recognized “identity” so as to distinguish itself from other end users and DRM-enabled search engines. Thus, the search engine (214) can programmatically access the content, subject to being able to obtain permission from the DRM solution.

This application is the US national phase of international application PCT/GB01/02826, filed in English on 26 Jun. 2001, which designated the US. The entire contents of this application are incorporated herein by reference.

The present invention is in the field of search engines and digital rights management. The present invention has particular applicability to searching for content in a DRM environment where the content is encrypted.

If there is to be a viable commerce based upon the electronic distribution of valuable multimedia content (such as for example reports, images, music tracks, videos, etc.), then there must be some means of enforcing and retaining copyright control over the electronic content. There is now emerging a set of hardware and software solutions, generically known as digital rights management (DRM) solutions, that aim to provide this copyright control while, to a varying degree, also enabling new commercial models suited to the Internet and electronic delivery. Common to virtually all these solutions is the requirement that the multimedia content files be distributed within a persistent tamperproof encryption wrapper (the idea being that a million copies of encrypted content are no more valuable than one). Very simply, DRM works by carefully providing the consumers of this encrypted content with secret decryption keys that provide temporary access to the content for some controlled purpose, e.g. viewing, printing, playing, etc., without ever providing access to the raw decrypted content that could be used for unauthorised reuse or redistribution.

FIG. 1 illustrates schematically an overview of how typical DRM systems work. Referring to FIG. 1, a “publisher” of digital content seals their digital content files, buffers or streams within a layer of encryption and digital signatures into a DRM-encrypted content format 102. The encryption makes it difficult for malicious consumers to obtain access to the raw decrypted content (and make unauthorised copies for redistribution). The digital signatures prevent malicious consumers from tampering with the encrypted content (perhaps to pass off the content as their own) by enabling the DRM system to detect the smallest change to the encrypted content. The DRM-encrypted content 102 can then be delivered to consumers via any electronic distribution medium 104, e.g. web, ftp, email, CD-ROM, etc. The publisher need not worry about protecting the DRM-encrypted content 102 in transit to the consumer since it is inherently protected by its encryption layer and digital signatures.

Less sophisticated DRM systems sometimes bundle individual consumer access rights with the content, either within the encryption layer or at least protected by the digital signatures. The advantage of bundling rights with the content is that the consumer can obtain both the content and the rights at the same time. Disadvantages include extreme inflexibility in the rights management policies that can be implemented and an enormous versioning problem (since there needs to be a separate version of encrypted content 102 for each consumer and a new version of the encrypted content whenever the rights change).

More sophisticated DRM systems deliver the rights separately from the content (from a DRM server 108). The rights are encoded in some electronic format 110 (i.e. electronic “rights”) and specify the permitted relationship between consumers and DRM-encrypted content sets (and subsets), e.g. which content the consumer can access, what they are permitted to do with it (e.g. printing), and for how long.

A specialised viewer (the DRM client 106) resident on the consumer device is required to obtain, manage and interpret the rights, temporarily decrypt the encrypted content and view/play it within a secure environment (so that the consumer cannot obtain access to the raw decrypted content or the decryption keys) subject to the restrictions implied by the consumer's rights (e.g. view but do not print a document). The DRM server 108 is responsible for issuing rights to requesting DRM clients 106. Current DRM systems typically issue rights to authenticated consumers at the time of purchase (or grant) and the rights are transferred to permanent storage on the consumer device 106.

In general, “content sets” can be thought of as a related set of one or more digital content files, buffers or streams. In general, “rights” can be thought of as an electronic description (explicit or by implication) of the association between consumers (or consumer devices) and DRM-protected content sets. Rights can optionally specify means of identifying the consumer (or consumer device) to which the rights ‘belong’; means of identifying the content sets and subsets to which the rights apply; encryption keys and checksums (cryptographic or otherwise); and the specific access rights granted to the consumers (and/or their consumer devices) over those content sets (e.g. whether or not the consumer can print a document, the duration of access, etc.). Rights can be encoded in any machine-readable form (e.g. parsable languages, specialised data structures, etc.) and are used internally by the DRM system to grant, deny or meter consumer access to encrypted content.
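By way of illustration only, rights of the kind described above might be held in a small data structure such as the following Python sketch. The class name, field names and the `permits` check are all hypothetical and are not taken from any particular DRM system; the sketch merely shows a consumer identifier, a content set identifier, granted permissions and a duration of access being used to grant or deny a requested action.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Rights:
    """Hypothetical rights record; every field name here is illustrative."""
    consumer_id: str        # identifies the consumer or consumer device
    content_set_id: str     # identifies the content set the rights apply to
    permissions: frozenset  # granted actions, e.g. {"view", "print"}
    expires_at: datetime    # end of the permitted access period
    content_key: bytes = field(default=b"", repr=False)  # kept out of repr()

    def permits(self, consumer_id, content_set_id, action, now):
        """Return True only if this record grants `action` at time `now`."""
        return (
            consumer_id == self.consumer_id
            and content_set_id == self.content_set_id
            and action in self.permissions
            and now < self.expires_at
        )


issued = datetime(2001, 6, 26, 12, 0)
rights = Rights("consumer-1", "reports-2001", frozenset({"view"}),
                issued + timedelta(days=30))

one_day_in = issued + timedelta(days=1)
can_view = rights.permits("consumer-1", "reports-2001", "view", one_day_in)
can_print = rights.permits("consumer-1", "reports-2001", "print", one_day_in)
can_view_late = rights.permits("consumer-1", "reports-2001", "view",
                               issued + timedelta(days=31))
```

In this sketch viewing is granted within the access period, while printing and late viewing are denied; metering of access is not modelled.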

A problem raised by the proliferation of DRM-encrypted content is that the search engines that are typically used by Internet users to locate content on the Internet cannot pierce this encryption layer to build their indexes and therefore DRM-encrypted content will be difficult to locate. Search engines come in many varieties but the basic concept is simple: they are either directed to a site or follow links to a site and build complex indexes for the content that end users can subsequently use to locate content by specifying free text or more specialised searches. Search engines are used to index content from across the entire Internet (e.g. Lycos, Excite, etc.) or locally on a particular site (e.g. the Microsoft search engine that performs free text search across Microsoft's site and internal knowledge repositories).

FIG. 2 illustrates schematically how a typical search engine operates. A server computer 202 controls access to both DRM-encrypted content 204 and to normal unencrypted content 206. In either case (DRM-encrypted content 204 or unencrypted content 206), the server 202 provides the content (designated by reference numeral 212 in FIG. 2) via a network 208 (e.g. the Internet or an intranet) to a search engine 214. The search engine 214 is operating under the control of an indexer 216 (or “crawler”) that sends requests 210 to the server 202. The search engine 214 parses the received content 212 and adds processed data to a searchable free text index 218. Because the search engine 214 is unable to parse the received content 212 that corresponds to DRM-encrypted content 204, the DRM-encrypted content 204 is not indexed.
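The behaviour of the conventional search engine of FIG. 2 can be sketched, under simplifying assumptions, as a free text inverted index that silently omits anything it cannot parse. In the Python sketch below the function name, the example locations and the use of `None` to stand in for unparsable DRM-encrypted content are all illustrative assumptions.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Build a free text inverted index mapping each term to its locations.

    `documents` maps a location to its text, or to None where the content
    cannot be parsed (standing in here for DRM-encrypted content).
    """
    index = defaultdict(set)
    skipped = []
    for location, text in documents.items():
        if text is None:
            skipped.append(location)  # unparsable, so never indexed
            continue
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(location)
    return index, skipped


documents = {
    "http://server.example/open.html": "Quarterly sales report",
    "http://server.example/sealed.drm": None,
}
index, skipped = build_index(documents)
```

A free text query for "sales" finds only the unencrypted page; the sealed file is absent from the index altogether, which is the problem addressed below.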

According to a first aspect of the present invention, there is provided a search engine that parses content encrypted by a digital rights management (DRM) system to produce a searchable index of the DRM-encrypted content, the search engine comprising: a DRM module that communicates with a DRM system to obtain access rights in order to be able to decrypt DRM-encrypted content for purposes of indexing; and, an indexer that parses the decrypted content to produce a searchable index.

According to a second aspect of the present invention, there is provided a method of producing a searchable index using a search engine that parses content encrypted by a digital rights management (DRM) system to produce a searchable index of the DRM-encrypted content, the method comprising the steps of: a DRM module of the search engine communicating with a DRM system to obtain access rights in order to be able to decrypt DRM-encrypted content for purposes of indexing; and, an indexer of the search engine parsing the decrypted content to produce a searchable index.

According to a third aspect of the present invention, there is provided a digital rights management (DRM) system, the system comprising: a DRM server that maintains location information for DRM-encrypted content managed by the server; the DRM server being adapted to communicate location information to a DRM-enabled search engine configured to index the DRM-encrypted content, to provide for a unified, trusted search over the DRM-encrypted content managed by the DRM server.

According to a fourth aspect of the present invention, there is provided a method of providing for a unified, trusted search over digital rights management (DRM) encrypted content managed by a DRM server that maintains location information for DRM-encrypted content managed by the server, the method comprising the step of: the DRM server communicating location information to a DRM-enabled search engine configured to index the DRM-encrypted content thereby to provide for a unified, trusted search over the DRM-encrypted content managed by the DRM server.

According to a fifth aspect of the present invention, there is provided a digital rights management (DRM) system, the system comprising: a DRM server that maintains location information for DRM-encrypted content managed by the DRM server; the DRM server being adapted to issue a DRM-enabled search engine with the rights to index DRM-encrypted content managed by the DRM server and to direct a said DRM-enabled search engine to the location of the DRM-encrypted content by the DRM server.

According to a sixth aspect of the present invention, there is provided a method of enabling a search engine to index digital rights management (DRM) content managed by a DRM server that maintains location information for DRM-encrypted content managed by the server, the method comprising the steps of: the DRM server issuing the DRM-enabled search engine with the rights to index DRM-encrypted content managed by the DRM server and directing the DRM-enabled search engine to the location of the DRM-encrypted content.

According to a seventh aspect of the present invention, there is provided a digital rights management (DRM) system, the system comprising: a DRM encryption toolkit arranged to produce an index of unencrypted content before encrypting the unencrypted content to produce DRM-encrypted content, and arranged to issue the index and location of the DRM-encrypted content to a search engine to enable a said search engine to merge the index and location of the DRM-encrypted content into a complete search engine index.

According to an eighth aspect of the present invention, there is provided a method of producing a searchable index of digital rights management (DRM) encrypted content, the method comprising the steps of: a DRM encryption toolkit producing an index of unencrypted content before encrypting the unencrypted content to produce DRM-encrypted content such that a search engine can merge the index and location of the DRM-encrypted content into a complete search engine index.

According to a ninth aspect of the present invention, there is provided a digital rights management (DRM) system including an indexing capability, the system comprising: a DRM encryption toolkit arranged to produce an index in an interoperable index format capable of being merged into a complete index of at least one search engine.

According to a tenth aspect of the present invention, there is provided a digital rights management (DRM) data structure comprising publicly accessible data, placed in with DRM-protected information but visible through a DRM encryption layer, that is descriptive of the contents of the DRM-protected information and that is to be indexed.

According to an eleventh aspect of the present invention, there is provided a storage medium having stored thereon/therein a data structure as described above.

According to a twelfth aspect of the present invention, there is provided a method of protecting information by digital rights management (DRM), the method comprising the steps of: protecting the information using DRM; and, placing publicly accessible data in with the DRM-protected information such that the publicly accessible data is visible through the DRM encryption layer, the publicly accessible data being descriptive of the contents of the DRM-protected information and being indexable by a search engine.

In accordance with a preferred embodiment of the present invention, DRM-encrypted content is “opened up” to “trusted search”, without compromising copyright control, thus allowing end users to locate DRM-encrypted content alongside open unencrypted content. The indexer (or crawler) of a search engine is provided with a DRM module for communication with a DRM server so that the indexer can access even the DRM-encrypted content nominally as if it were a human end user of the content. The indexer may be issued with a DRM-recognised “identity” so as to distinguish itself from other end users and other DRM-enabled search engines. Thus, the search engine can programmatically access the content, subject to being able to obtain permission from the DRM system.

Embodiments of the present invention will now be described by way of example with reference to the accompanying drawings, in which:

FIG. 1 illustrates schematically an overview of conventional digital rights management (DRM) systems;

FIG. 2 illustrates schematically an overview of a conventional search engine;

FIG. 3 illustrates schematically the use of an example of a “DRM-enabled” search engine in accordance with an embodiment of the present invention;

FIG. 4 illustrates schematically an example of how a DRM-enabled search engine may interoperate with a DRM service bureau in accordance with an embodiment of the present invention;

FIG. 5 illustrates schematically an example of a DRM system in which indexing is done at the time of encryption in accordance with an embodiment of the present invention; and,

FIG. 6 illustrates schematically an example of the use of an incremental search index exchange format in accordance with an embodiment of the present invention.

In accordance with the preferred embodiment of the present invention, DRM-encrypted content is “opened up” to “trusted search”, without compromising copyright control, thus allowing end users to locate DRM-encrypted content alongside open unencrypted content.

FIG. 3 illustrates schematically an example of an embodiment of the present invention. Like the prior art FIG. 2 embodiment, a request 210 is made to access particular content available via a network 208 which may be or include the Internet, an intranet or other network. The indexer (or crawler) 216 of the search engine 214 has a DRM module 302 for communication with a DRM server 306 so that it can access the content nominally as if it were a human end user of the content. In some embodiments, the indexer 216 is issued with a DRM-recognised “identity” so as to distinguish itself from other end users and DRM-enabled search engines. Thus, the search engine 214 can programmatically access the content, subject to being able to obtain permission from the DRM solution. That is, where a human would use a secure DRM-controlled viewer to read or play encrypted content, the “crawler” 216 is provided with the secure DRM-controlled module 302 to be able to programmatically access and index encrypted content.

When the crawler 216 attempts to index the encrypted content within the content 212, the crawler 216 is granted or denied access to the encrypted content depending upon whether the content owner has indicated to the DRM server 306 to authorise the DRM module 302 to associate the crawler 216 with the rights to access and index that encrypted content. For most DRM solutions, this authorisation involves the crawler 216 requesting and receiving from the DRM server 306 a cryptographic key 304 for a given content file, temporarily decrypting the file, and then performing its normal indexing procedure as if the file had never been encrypted. According to one embodiment, rights granted may be revoked. For example, a content owner may want to revoke the access granted to a specific search engine.
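The grant, temporary decryption, indexing and revocation flow described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not an implementation of any real DRM solution: `ToyDrmServer`, `toy_cipher`, `crawl` and the identity string "crawler-A" are hypothetical names, and the cipher is a toy XOR keystream that is NOT secure and merely stands in for real DRM encryption.

```python
import hashlib
import re
from collections import defaultdict


def toy_cipher(key, data):
    """XOR `data` with a SHA-256-derived keystream. Toy stand-in, NOT secure.

    XOR is its own inverse, so the same call both encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyDrmServer:
    """Hypothetical DRM server holding keys and per-identity grants."""

    def __init__(self):
        self._keys = {}       # content_id -> decryption key
        self._grants = set()  # (identity, content_id) pairs set by the owner

    def register(self, content_id, key):
        self._keys[content_id] = key

    def authorise(self, identity, content_id):
        self._grants.add((identity, content_id))

    def revoke(self, identity, content_id):
        self._grants.discard((identity, content_id))

    def request_key(self, identity, content_id):
        """Return the key only if this identity currently holds a grant."""
        if (identity, content_id) in self._grants:
            return self._keys[content_id]
        return None


def crawl(identity, server, encrypted_files, index):
    """Index every file a key is granted for; return the ids that were denied."""
    denied = []
    for content_id, blob in encrypted_files.items():
        key = server.request_key(identity, content_id)
        if key is None:
            denied.append(content_id)
            continue
        text = toy_cipher(key, blob).decode("utf-8")  # held in memory only
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(content_id)
    return denied


server = ToyDrmServer()
server.register("report.drm", b"k1")
server.register("track.drm", b"k2")
files = {
    "report.drm": toy_cipher(b"k1", b"quarterly sales report"),
    "track.drm": toy_cipher(b"k2", b"acoustic guitar track"),
}

server.authorise("crawler-A", "report.drm")
index = defaultdict(set)
denied = crawl("crawler-A", server, files, index)

server.revoke("crawler-A", "report.drm")
denied_after_revoke = crawl("crawler-A", server, files, defaultdict(set))
```

In this sketch only the authorised file contributes terms to the index, and after revocation a fresh crawl is denied both files. What should happen to index entries built before a revocation is a design choice the sketch does not address.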

FIG. 4 illustrates schematically how a DRM bureau may interoperate with DRM-enabled search engines. That is, some DRM solutions are operated in a bureau fashion, by which it is meant that a third party organisation provides a DRM-based service that publishers can use to securely publish their content 401 to DRM-enabled consumers 106 without needing to support their own potentially expensive DRM infrastructure (e.g. rights-clearing servers, secure repositories, etc.). In most DRM architectures, this amounts to a network-accessible DRM server 306 that handles the “behind-the-scenes” rights management traffic (e.g. buying rights, storing rights, serving rights, etc.) that arises as consumers purchase and use DRM-encrypted content 212. A DRM encryption toolkit 402 is used to seal content within a layer of encryption and digital signatures. The DRM encryption toolkit 402 obtains the encryption keys from the DRM server 306 and seals metadata in with the encrypted content, for example identifying the item of content and its membership within defined content sets and categories and providing links to the DRM server 306 and web pages pertaining to the DRM-encrypted content. Some of this metadata is provided directly to the DRM encryption toolkit and some is obtained at the time of encryption from the DRM server 306.

Depending upon how the DRM bureau server operates, it may over time accumulate a large amount of information about where encrypted files are located. By way of example, the DRM encryption toolkit 402 communicates with the bureau DRM server 306 and can record where the resultant DRM-encrypted file is placed. The secure viewers executing within the client computers 106 generate logging information when the DRM-encrypted content is accessed, which can include the location of the DRM-protected content. Using trusted search capabilities such as those described above with reference to FIG. 3, a DRM bureau server 306 can therefore provide a unified, trusted search over the set of DRM-encrypted content 212 managed by that DRM server 306. This may, for example, be by operating a DRM-enabled search engine 404, or directing a third-party DRM-enabled search engine 408 to the content files 212. This search capability is typically available to all consumers accessing encrypted content via that DRM bureau and provides a unified search across all the encrypted content from participating publishers.

The DRM encryption functionality 402 uses hardware or software components to actually encrypt the content files 401. The files are either encrypted by the publisher or on their behalf by a DRM service provider. Often, but not always, additional information is embedded alongside the content and within the encryption, e.g. information about the content, the publisher, where to go to obtain access rights, abstracts, watermarks, etc.

FIG. 5 illustrates schematically an example of a system in which indexing is done at the time of encryption. Namely, the DRM encryption toolkit 402 has access to the “raw” content 401 prior to encryption so the toolkit 402 can itself integrate the search engine's indexing technology and generate an index record 502 for the content file 401 as part of the encryption process. This index record 502, together with an indication of the location of the encrypted file, is provided to the search engine 504 to be merged into a complete index. This is a useful concept since the final part of the electronic publishing process could be viewed (see reference numeral 502) as (i) index, (ii) encrypt, (iii) publish to accessible location, and (iv) send index record plus location to be merged into full search engine index. This avoids the rather hit-and-miss approach of publishing content and then endeavouring to attract the attention of search engines to come and index recently published content.
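The index-then-encrypt ordering of FIG. 5 can be sketched as below. The function names, the record layout (a dictionary with "location" and "terms") and the example location are assumptions made purely for illustration, and `placeholder_encrypt` is a constant XOR that is NOT real encryption; it stands in for whatever sealing step the toolkit actually performs.

```python
import re


def index_then_encrypt(raw_text, location, encrypt):
    """Index raw content, then seal it; returns (sealed_bytes, index_record).

    `encrypt` is a caller-supplied callable standing in for the toolkit's
    real encryption step, which this sketch does not model.
    """
    # (i) index: the toolkit still holds the raw content at this point
    terms = sorted(set(re.findall(r"[a-z0-9]+", raw_text.lower())))
    # (ii) encrypt
    sealed = encrypt(raw_text.encode("utf-8"))
    # (iii) publishing and (iv) sending are left to the caller; the record
    # carries the location so that a search engine can merge it directly
    record = {"location": location, "terms": terms}
    return sealed, record


def placeholder_encrypt(data):
    """Byte-wise XOR with a constant: a placeholder only, NOT real encryption."""
    return bytes(b ^ 0x5A for b in data)


raw = "Annual report: sales rose, costs fell."
sealed, record = index_then_encrypt(
    raw, "https://publisher.example/annual.drm", placeholder_encrypt
)
```

The search engine receiving `record` never needs to see the raw content or hold a decryption key, which is the attraction of indexing at the time of encryption.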

The FIG. 5 embodiment can be generalised to the case of a DRM bureau that either encrypts the publisher's content on their behalf or provides tools to allow the publisher to encrypt their own content to make use of the bureau's DRM solution. In both cases, the encryption tool can, as described above, index the content file prior to encryption and transmit the index information either (a) to third party search engines, or (b) to the bureau for consolidation prior to transmission on to third party search engines (dramatically increasing indexing throughput), or (c) to a “unified” bureau search engine as described above.

In another embodiment, the publishers can add publicly accessible data (e.g. abstracts) to DRM-protected content, either at the time the content is encrypted by the DRM encryption toolkit 402 or afterwards. This data sits outside the encryption wrapper and can be indexed by search engines that have not been DRM-enabled. The data would be chosen to be descriptive of the DRM-protected content so that, when indexed by a search engine, it will lead the search engine to the DRM-protected content it describes.

In yet another embodiment, the publicly accessible data is protected from tampering by digital signatures, so that if the data is modified the DRM-enabled search engines 404 and DRM clients 106 can detect this tampering and ignore the tampered data.
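The tamper check described above can be illustrated with Python's standard `hmac` module. Note the substitution: an HMAC is a symmetric message authentication code, used here only because it is available in the standard library, whereas the embodiment refers to digital signatures, which would normally use an asymmetric key pair so that verifiers need not hold the signing secret. The function names, key and abstract text are hypothetical.

```python
import hashlib
import hmac


def tag_public_data(key, public_data):
    """Compute an authentication tag over the publicly accessible data."""
    return hmac.new(key, public_data, hashlib.sha256).digest()


def accept_public_data(key, public_data, tag):
    """Return True only if the data still matches its tag (untampered)."""
    return hmac.compare_digest(tag_public_data(key, public_data), tag)


key = b"demo-key-for-illustration-only"
abstract = b"Abstract: quarterly sales figures for 2001."
tag = tag_public_data(key, abstract)

accepted = accept_public_data(key, abstract, tag)
rejected = accept_public_data(key, abstract.replace(b"sales", b"spam!"), tag)
```

A search engine or client following this pattern would index the abstract only when the check succeeds and would ignore it otherwise.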

In yet another embodiment, the publicly accessible data added to the DRM-encrypted content is metadata or keywords chosen by the publisher to correspond to terms sent to search engines, similar to meta tags in HTML pages.

In yet another embodiment, the publicly accessible data added to the DRM-encrypted content is generated automatically using search indexing technology, which produces a sequence of keywords from the DRM-protected content that is indexable by other search engines but from which the original DRM-protected content cannot be reconstituted.
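One simple way such keywords might be generated is sketched below. This is an assumption about how the idea could be realised, not a description of any particular indexing technology: the function keeps only the most frequent non-stopword terms and emits them in alphabetical order, so word order, repetition counts and all remaining words are discarded. The stopword list, function name and `limit` parameter are illustrative.

```python
import re

# Illustrative stopword list; a real indexer would use a much fuller one.
STOPWORDS = frozenset({"a", "an", "and", "in", "of", "the", "to"})


def public_keywords(text, limit=10):
    """Return up to `limit` of the most frequent terms, alphabetically sorted."""
    counts = {}
    for term in re.findall(r"[a-z]+", text.lower()):
        if term not in STOPWORDS:
            counts[term] = counts.get(term, 0) + 1
    # Rank by descending frequency, breaking ties alphabetically
    top = sorted(counts, key=lambda t: (-counts[t], t))[:limit]
    # Emit alphabetically so the output reveals neither order nor frequency
    return sorted(top)


keywords = public_keywords("The cat and the dog chased the cat", limit=2)
```

Keywords produced this way still disclose some of the vocabulary of the protected content, so the value chosen for `limit` trades discoverability against disclosure.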

FIG. 6 illustrates schematically how multiple incremental indexes 602 resulting from the integration of search engine indexing technology into publishing components such as the DRM content encryption component are consolidated by a search engine merge function 604 into a “complete” index that is usable by the search engine to answer end user search requests. In one embodiment, multiple search engines exchange and consolidate incremental search indexes in an interoperable electronic file format. The incremental search indexes have sufficient information to be efficiently merged into the complete index used for the actual search.
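A merge function of the kind shown at 604 can be sketched as follows. The use of JSON as the interoperable file format, and the layout mapping each term to a list of locations, are assumptions made for illustration; no particular exchange format is specified above.

```python
import json


def merge_incremental(complete, incremental_json):
    """Merge one incremental index (JSON: term -> list of locations).

    `complete` maps each term to a set of locations and is updated in place.
    """
    for term, locations in json.loads(incremental_json).items():
        complete.setdefault(term, set()).update(locations)
    return complete


# Two incremental indexes, e.g. produced by two separate encryption toolkits
increment_a = json.dumps({"sales": ["a.drm"], "report": ["a.drm"]})
increment_b = json.dumps({"report": ["b.drm"], "guitar": ["b.drm"]})

complete_index = {}
for increment in (increment_a, increment_b):
    merge_incremental(complete_index, increment)
```

Because each incremental index already carries term-to-location pairs, merging is a set union per term and needs no access to the underlying content.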

Embodiments of the present invention have been described with particular reference to the examples illustrated. However, it will be appreciated that variations and modifications may be made to the examples described within the scope of the present invention.

1. A DRM-enabled search engine for locating digital rights management (DRM)-encrypted content available via a network, the DRM-enabled search engine comprising: a crawler that issues requests over a network to one or more servers to access DRM-encrypted content and receives, in return, DRM-encrypted content for purposes of indexing said content, the crawler having a DRM identity that can be recognized by a DRM system to distinguish itself from end users and other DRM-enabled search engines; and a DRM module that communicates the DRM identity of the crawler to the DRM system to conditionally obtain access rights comprising one or more decryption keys from the DRM system for the received DRM-encrypted content based on the DRM identity, the DRM module being adapted to use the access rights to temporarily decrypt said received DRM-encrypted content for purposes of indexing said content, wherein the crawler parses the received content previously decrypted by the DRM module to index said content, produces processed data based on said decrypted content, and adds said processed data to a searchable index, wherein a user of the DRM-enabled search engine is able to locate the indexed DRM-encrypted content on the network from outside of the DRM system by performing a search query on the searchable index, and wherein a content owner is able to revoke rights to access and index content from a specific DRM-enabled search engine using the DRM identity of the crawler.
2. A DRM-enabled search engine according to claim 1, wherein the access rights obtained by the DRM module are limited to those required for indexing the DRM-encrypted content.
3. A DRM-enabled search engine according to claim 1, wherein the DRM module is issued the access rights to index a subset of available DRM-encrypted content.
4. A DRM-enabled search engine according to claim 1, wherein the access rights are selectively granted and revoked.
5. A method of producing a searchable index for use in locating digital rights management (DRM)-encrypted content available via a network, the method comprising: a crawler of a DRM-enabled search engine issuing a request over a network to one or more servers to access DRM-encrypted content and receiving, in return, DRM-encrypted content for purposes of indexing said content, the crawler having a DRM identity that can be recognized by a DRM system to distinguish itself from end users and other DRM-enabled search engines; a DRM module of the search engine communicating the DRM identity of the crawler to the DRM system to conditionally obtain access rights comprising one or more decryption keys from the DRM system for the received DRM-encrypted content based on the DRM identity and temporarily decrypting said DRM-encrypted content for purposes of indexing said content; the crawler parsing the received content previously decrypted by the DRM module to index said decrypted content, producing processed data based on said content and adding said processed data to a searchable index; and enabling a user of the DRM-enabled search engine to locate the indexed DRM-encrypted content on the network from outside of the DRM system by performing a search query on the searchable index, wherein a content owner is able to revoke rights to access and index content from a specific DRM-enabled search engine using the DRM identity of the crawler.
6. A method according to claim 5, wherein the access rights issued to the DRM module are limited to those required for indexing the DRM-encrypted content.
7. A method according to claim 5, wherein the DRM module is issued the access rights to index a subset of available DRM-encrypted content.
8. A search engine according to claim 5, wherein the access rights are selectively granted and revoked.
9. A digital rights management (DRM) system, the system comprising: a DRM-enabled search engine according to claim 1, wherein the DRM system comprises a DRM server that maintains location information for DRM-encrypted content managed by the server, the DRM server being programmed to communicate location information to the DRM-enabled search engine in order to provide for a unified, trusted search over the DRM-encrypted content managed by the DRM server.
10. A system according to claim 9, wherein the DRM server is adapted to communicate the location information to said DRM-enabled search engine as the DRM server obtains the location information.
11. A system according to claim 9, further comprising means for granting said DRM-enabled search engine additional rights to enable the DRM-enabled search engine to decrypt and index additional DRM-encrypted content managed by the DRM server.
12. A method according to claim 5, the method further comprising the preliminary steps of: receiving a request at the DRM system for access to the DRM-encrypted content from the crawler; validating the DRM identity of the crawler; and communicating location information of the DRM-encrypted content to the DRM-enabled search engine.
13. A method according to claim 12, wherein the DRM system comprises a DRM server that communicates the location information to the DRM-enabled search engine as the DRM server obtains the location information.
14. A method according to claim 12, further comprising granting the DRM-enabled search engine additional rights to enable said DRM-enabled search engine to decrypt and index additional DRM-encrypted content managed by the DRM server.
15. A digital rights management (DRM) system, the system comprising: a DRM-enabled search engine according to claim 1, wherein the DRM system comprises a DRM server that maintains location information for DRM-encrypted content managed by the DRM server, the DRM server being programmed to issue the DRM-enabled search engine with the rights to access and index DRM-encrypted content managed by the DRM server and to direct the DRM-enabled search engine to the location of the DRM-encrypted content by the DRM server.
16. A method according to claim 5, further comprising: the DRM system issuing the DRM-enabled search engine with the rights to index DRM-encrypted content and directing the DRM-enabled search engine to the location of the DRM-encrypted content.
17. A DRM-enabled search engine according to claim 1, wherein the searchable index is a searchable free-text index for use in locating DRM-encrypted content using free-text searches.
18. A method according to claim 5, wherein the searchable index is a searchable free-text index for use in locating DRM-encrypted content using free-text searches.
19. A DRM-enabled search engine according to claim 1, wherein the crawler produces a searchable index of information contained within the DRM-encrypted content, the user of the search engine being able to locate the DRM-encrypted content on the network by referencing said information.
20. A method according to claim 5, wherein the parsing of the content produces a searchable index of information contained within the DRM-encrypted content, the user of the DRM-enabled search engine being able to locate the DRM-encrypted content on the network by referencing said information.
21. A search engine according to claim 1, wherein the crawler programmatically accesses the DRM-encrypted content to parse said content and produce a searchable index.
22. A method according to claim 5, wherein the step of parsing the content further comprises programmatically accessing said content.